Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning



Abstract

Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
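
To make the idea of pairing a dynamics model with MPC concrete, here is a minimal sketch in Python. It rests on several assumptions that the abstract does not state: the planner is a simple random-shooting scheme (sample candidate action sequences, roll them out through the model, execute the first action of the cheapest one), the "model" is a hand-written 1-D point mass standing in for a trained neural network, and every name (`dynamics_model`, `cost`, `mpc_action`, `run_episode`) and parameter value is hypothetical.

```python
import numpy as np


def dynamics_model(state, action):
    """Stand-in for a learned model f(s, a) -> s'.

    Assumption: a 1-D point mass with state = [position, velocity]
    and a scalar acceleration action. A trained network would be
    called here instead.
    """
    dt = 0.1
    pos, vel = state[..., 0], state[..., 1]
    new_vel = vel + dt * action[..., 0]
    new_pos = pos + dt * new_vel
    return np.stack([new_pos, new_vel], axis=-1)


def cost(state, target):
    """Hypothetical cost: squared distance to target plus a small velocity penalty."""
    return (state[..., 0] - target) ** 2 + 0.1 * state[..., 1] ** 2


def mpc_action(state, target, rng, horizon=10, n_candidates=500):
    """Random-shooting MPC: return the first action of the cheapest sampled sequence."""
    # Candidate action sequences, shape (n_candidates, horizon, 1).
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, 1))
    # Every candidate rollout starts from the current state.
    states = np.repeat(state[None, :], n_candidates, axis=0)
    total_cost = np.zeros(n_candidates)
    for t in range(horizon):
        states = dynamics_model(states, actions[:, t])
        total_cost += cost(states, target)
    best = np.argmin(total_cost)
    return actions[best, 0]


def run_episode(steps=100, target=1.0, seed=0):
    """Replan at every step and apply only the first planned action."""
    rng = np.random.default_rng(seed)
    state = np.array([0.0, 0.0])
    for _ in range(steps):
        action = mpc_action(state, target, rng)
        # In this sketch the stand-in model also plays the role of the environment.
        state = dynamics_model(state, action)
    return state


if __name__ == "__main__":
    print("final [position, velocity]:", run_episode())
```

Replanning at every step is what lets a controller of this kind tolerate model error: only the first action of each plan is ever executed, so mistakes in the later part of a rollout are corrected at the next step.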
